Building a hedonic pricing models to explain factors affecting the resale prices of Singapore public housing.
Housing is essential for everyone and is a major investment for many people in Singapore. Structural and location factor affects the price of housing. In this exercise, we are interested to find out which factor affects the price of 4-room HDB flat transacted from 1st January 2019 to 30th September 2020. We will be building a hedonic pricing model using GWR methods in the exercise.
We need to first set up the environment by install the relevant packages and set up the token.
The following code chunk install the following packages:
packages = c('readr', 'tmap', 'sf', 'onemapsgapi', 'httr', 'dplyr', 'stringr', 'jsonlite', 'tidyr', 'sp', 'dummies', 'geojsonio', 'ggplot2', 'ggpubr', 'GWmodel', 'corrplot', 'olsrr', 'spdep')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
Next, we set up the token for onemapsgapi. The code is not shown because the information is sensitive.
We will be preparing the data in this section. Note that during the data preparation, files will be written in data/newaspatial as a backup. After data preparation, these backup files will be deleted deleted due to Github’s file limit.
The following code chunk uses read_csv function of readr package to import the hdb resales data and display the data.
hdbresale <- read_csv("data/aspatial/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")
hdbresale
The data contains multiple rows but our focus is only on four-room flat with transaction period from 1st January 2019 to 30th September 2020.
The following code chunk filters the four-room flat data from Jan 2019 to September 2020 using filter function of base package.
Then, we created a new column using mutate function of dplyr package for the following column:
hdbresale_filtered <- hdbresale %>%
filter(month %in% c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05", "2019-06", "2019-07", "2019-08", "2019-09", "2019-10", "2019-11", "2019-12","2020-01", "2020-02", "2020-03", "2020-04", "2020-05", "2020-06", "2020-07", "2020-08", "2020-09")) %>%
filter(flat_type == "4 ROOM")
hdbresale_new <- hdbresale_filtered %>%
mutate(hdbresale_filtered, address = paste(as.character(block),as.character(street_name))) %>%
mutate(hdbresale_filtered, storey_range_lower = str_sub(storey_range, 0, 2)) %>%
mutate(hdbresale_filtered, storey_range_upper = str_sub(storey_range, -3, -1)) %>%
mutate(hdbresale_filtered, remaining_lease_year = str_sub(remaining_lease, 0, 2)) %>%
select(month, town, address, block, street_name, flat_type, storey_range, storey_range_lower, storey_range_upper, floor_area_sqm, flat_model, lease_commence_date, remaining_lease, remaining_lease_year, resale_price)
hdbresale_new
After filtering, We can see that from Jan 2019 to September 2020, there are 15901 transactions for four-room flat in Singapore.
Next, we get the get long with the address provided.
The follow code chunk:
for (i in 1:nrow(hdbresale_new)) {
address <- hdbresale_new[i,'address']
url = paste('https://developers.onemap.sg/commonapi/search?searchVal=', address, '&returnGeom=Y&getAddrDetails=Y&pageNum=1')
encoded_url <- URLencode(url)
getcoordinate <- GET(encoded_url)
jsonParsed <- content(getcoordinate,as="parsed")
if (length(jsonParsed$results) > 0) {
hdbresale_new[i,'LATITUDE'] = jsonParsed$results[[1]]$LATITUDE
hdbresale_new[i,'LONGITUDE'] = jsonParsed$results[[1]]$LONGITUDE
}
}
The following code chunk use is.na function of base package and sum to calculate the total number of NA is there is any.
There are 26 rows with missing lat long. We will take a look at which areas are the affected rows
The following code chunk filters the rows with missing latlong
From the extracted missing rows, the common location with missing lat long is ST. GEORGE.
We will next assign the lat long to these rows as long as the address is “ST. GEORGE” if there are any result
Similar to 4.1.2, this time we set the address as default “ST.GEORGE” and assign to each row as long as there is result
for (i in 1:nrow(hdbresaleNA)) {
address = "ST. GEORGE"
url = paste('https://developers.onemap.sg/commonapi/search?searchVal=', address, '&returnGeom=Y&getAddrDetails=Y&pageNum=1')
encoded_url <- URLencode(url)
getcoordinate <- GET(encoded_url)
jsonParsed <- content(getcoordinate,as="parsed")
if (length(jsonParsed$results) > 0) {
hdbresaleNA[i,'LATITUDE'] = jsonParsed$results[[1]]$LATITUDE
hdbresaleNA[i,'LONGITUDE'] = jsonParsed$results[[1]]$LONGITUDE
}
}
The following code chunk check if there are still any missing value
hdbresaleNA
There are no missing value. However, the lat long for the 26 points are all the same.
In order to avoid having overlapping points, the following code chunk use jitter function of base package so that the lat long are distinct. factor is set to random small number so that the jitter would not go very far away from the area.
hdbresaleNA$LATITUDE <- jitter(as.numeric(hdbresaleNA$LATITUDE), factor = 0.00123)
hdbresaleNA$LONGITUDE <- jitter(as.numeric(hdbresaleNA$LONGITUDE), factor = 0.00123)
hdbresaleNA
The lat long are now different.
The following code chunk filters the rows with latlong using drop_na function of tidyr to drop rows with missing LATITUDE values
hdbresaleNOTNA <- hdbresale_new %>%
drop_na(LATITUDE)
hdbresaleNOTNA
The following code chunk combines hdbresaleNA and hdbresaleNOTNA using rbind function of base package.
hdbresale_latlong <- rbind(hdbresaleNA, hdbresaleNOTNA)
glimpse(hdbresale_latlong)
We have 15901 rows which is same as the number of rows of the initial dataset.
The following code chunk converts the dataset with latlong decimal degree to metres coordinates, and transform the crs to Singapore’s projection, 3414.
hdbresale_latlongm <- st_as_sf(hdbresale_latlong,
coords = c("LONGITUDE",
"LATITUDE"),
crs=4326) %>%
st_transform(crs = 3414)
The following code chunk plots the points to see if the data is correct using functions of tmap package.
tmap_mode("view")
tm_shape(hdbresale_latlongm)+
tm_dots()
tmap_mode("plot")
The following code chunk export data with sf as csv so that we do not have to re-run the time-consuming process above again.
write.csv(hdbresale_latlong, "data/newaspatial/hdbresale_latlongonly.csv", row.names = FALSE)
The following code chunk import the latest hdb data so that the subsequent calculated data can added.
hdb_updated <- read_csv("data/aspatial/hdbresale_latlongonly.csv")
Since CBD is a huge area, we will first define the lat long for CBD, taken from Google. (Latitude : 1.2801, Longitude : 103.8509)
The following code chunk:
This process repeated a few times. This is because the onemapsgapi may return NA value for certain rows due to the request limit. Note that the first data was hdbresale_latlononly, and it was replaced by the data with missing distance, hdbresale_latlononly_CBD_NA_3, because of the repetition of this process.
for (i in 1:nrow(hdbresale_latlononly_CBD_NA3)) {
startlat <- hdbresale_latlononly_CBD_NA3[i,'LATITUDE']
startlong <- hdbresale_latlononly_CBD_NA3[i,'LONGITUDE']
start = gsub(' ', '', paste(startlat, ',', startlong))
end = '1.2801,103.8509'
routeType = 'drive'
url = gsub(' ', '', paste('https://developers.onemap.sg/privateapi/routingsvc/route?start=', start, '&end=', end, '&routeType=', routeType, '&token=', token))
getproximitytocbd <- GET(url)
jsonParsed <- content(getproximitytocbd,as="parsed")
if (length(jsonParsed$route_summary) > 0) {
hdbresale_latlononly_CBD_NA3[i,'PROX_CBD'] = jsonParsed$route_summary$total_distance/1000
}
}
The following code chunk write a copy of the csv for backup purposes.
write.csv(hdbresale_latlononly_CBD_NA3, "data/newaspatial/hdbresale_PROXCBD_fifthrun.csv", row.names = FALSE)
The following code chunk filter the rows with distance NA, and repeat the above code chunk to obtain distance for remaining rows with NA value.
After a few iteration, hdbresale_latlononly_CBD_NA4 no long have any data which means all rows have PROX_CBD distance.
The following code chunk filter the rows with valid distance value and write a copy of the csv in the file path
hdbresale_latlononly_CBD_NONA5 <- hdbresale_latlononly_CBD_NA3 %>%
drop_na(PROX_CBD)
hdbresale_latlononly_CBD_NONA5
write.csv(hdbresale_latlononly_CBD_NONA5, "data/newaspatial/hdbresale_proxcbd5.csv", row.names = FALSE)
The following code chunk rbind all the dataset after getting the PROX_CBD of every row from the repetitive steps of 4.2.1 - 4.2.3. The final data with PROX_CBD is written as csv for backup
Data to bind includes:
The following code chunk combines PROX_CBD with the updated data, hdb_updated using cbind function of base package which combines column.
hdb_updated <- cbind(hdb_updated, PROX_CBD)
The following code chunk import eldercare shapefile data using st_read.
eldercare_sf <- st_read(dsn = "data/independentvar/extracted", layer="ELDERCARE")
There are 133 rows of eldercare. The projected CRS is already in SVY21.
The following code chunk:
hdb_updated_geometry <- st_as_sf(hdb_updated,
coords = c("LONGITUDE",
"LATITUDE"),
crs=4326) %>%
st_transform(crs = 3414)
hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)
elderlycare_df <- eldercare_sf$geometry
elderlycare <- st_coordinates(elderlycare_df)
dist_hdb_elderlycare <- spDists(hdb, elderlycare, longlat=FALSE)
write.csv(dist_hdb_elderlycare, "data/newaspatial/hdbtoeldercare.csv", row.names = FALSE)
The following code chunk:
dist_hdb_elderlycare_df <- data.frame(dist_hdb_elderlycare)
dist_hdb_elderlycare_min <- dist_hdb_elderlycare_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X133)))
PROX_ELDERLYCARE <- dist_hdb_elderlycare_min$Min/1000
The following code chunk combines PROX_ELDERLYCARE with the updated data, hdb_updated, via cbind function of base package.
hdb_updated <- cbind(hdb_updated, PROX_ELDERLYCARE)
The following code chunk writes the updated data for backup purposes.
write.csv(hdb_updated, "data/newaspatial/hdbupdated1.csv", row.names = FALSE)
The following code chunk import and assign the projection crs of Singapore 3414.
hawkercentre_sf <- st_read("data/independentvar/extracted/hawker-centres-geojson.geojson") %>%
st_transform(crs = 3414)
There are 125 rows of Hawker Centre.
The following code chunk is similar to 4.3.2. The only difference is that we only use the hawkercentre’s first and second column (X and Y coordinate) since the st_coordinates also include the Z coodinate column in the data.
hdb_updated_geometry <- st_as_sf(hdb_updated,
coords = c("LONGITUDE",
"LATITUDE"),
crs=4326) %>%
st_transform(crs = 3414)
hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)
hawkercentre_df <- hawkercentre_sf$geometry
hawkercentre <- st_coordinates(hawkercentre_df)
hawkercentre <- hawkercentre[,c(1,2)]
dist_hdb_hawkercentre <- spDists(hdb, hawkercentre, longlat=FALSE)
The following code chunk is similar to 4.3.3.
write.csv(dist_hdb_hawkercentre, "data/newaspatial/hdbtohawker.csv", row.names = FALSE)
The following code chunk is similar to 4.3.4.
dist_hdb_hawkercentre_df <- data.frame(dist_hdb_hawkercentre)
dist_hdb_hawkercentre_min <- dist_hdb_hawkercentre_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X125)))
PROX_HAWKER <- dist_hdb_hawkercentre_min$Min/1000
The following code chunk is similar to 4.3.5
hdb_updated <- cbind(hdb_updated, PROX_HAWKER)
write.csv(hdb_updated, "data/newaspatial/hdbupdated2.csv", row.names = FALSE)
The following code chunk import the MRT shapefile using st_read.
mrtlrt_sf <- st_read(dsn = "data/independentvar/extracted", layer="MRTLRTStnPtt")
There are 185 rows of MRTLRT data. The project CRS is already SVY21.
The following code chunk is similar to 4.3.2.
hdb_updated_geometry <- st_as_sf(hdb_updated,
coords = c("LONGITUDE",
"LATITUDE"),
crs=4326) %>%
st_transform(crs = 3414)
hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)
mrtlrt_df <- mrtlrt_sf$geometry
mrtlrt <- st_coordinates(mrtlrt_df)
dist_hdb_mrtlrt <- spDists(hdb, mrtlrt, longlat=FALSE)
The following code chunk is similar to 4.3.3
write.csv(dist_hdb_mrtlrt, "data/newaspatial/hdbtomrtlrt.csv", row.names = FALSE)
The following code chunk is similar to 4.3.4
dist_hdb_mrtlrt_df <- data.frame(dist_hdb_mrtlrt)
dist_hdb_mrtlrt_min <- dist_hdb_mrtlrt_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X185)))
PROX_MRTLRT <- dist_hdb_mrtlrt_min$Min/1000
The following code chunk is similar to 4.3.5
hdb_updated <- cbind(hdb_updated, PROX_MRTLRT)
The following code chunk is similar to 4.3.6
write.csv(hdb_updated, "data/newaspatial/hdbupdated3.csv", row.names = FALSE)
The following code chunk import and assign the projection crs of Singapore 3414.
parks_sf <- st_read("data/independentvar/extracted/parks-geojson.geojson") %>%
st_transform(crs = 3414)
The following code chunk is similar to 4.4.2
hdb_updated_geometry <- st_as_sf(hdb_updated,
coords = c("LONGITUDE",
"LATITUDE"),
crs=4326) %>%
st_transform(crs = 3414)
hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)
parks_df <- parks_sf$geometry
parks <- st_coordinates(parks_df)
parks <- parks[,c(1,2)]
dist_hdb_parks <- spDists(hdb, parks, longlat=FALSE)
The following code chunk is similar to 4.4.3
write.csv(dist_hdb_parks, "data/newaspatial/hdbtoparks.csv", row.names = FALSE)
The following code chunk is similar to 4.4.4
dist_hdb_parks_df <- data.frame(dist_hdb_parks)
dist_hdb_parks_min <- dist_hdb_parks_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X350)))
PROX_PARKS <- dist_hdb_parks_min$Min/1000
The following code chunk is similar to 4.4.5
hdb_updated <- cbind(hdb_updated, PROX_PARKS)
The following code chunk is similar to 4.4.6
write.csv(hdb_updated, "data/newaspatial/hdbupdated4.csv", row.names = FALSE)
The following code chunk import data and assign the projection crs of Singapore 3414.
supermarkets_sf <- st_read("data/independentvar/extracted/supermarkets-geojson.geojson") %>%
st_transform(crs = 3414)
There are 526 rows of supermarket.
The following code chunk is similar to 4.4.2
hdb_updated_geometry <- st_as_sf(hdb_updated,
coords = c("LONGITUDE",
"LATITUDE"),
crs=4326) %>%
st_transform(crs = 3414)
hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)
supermarkets_df <- supermarkets_sf$geometry
supermarkets <- st_coordinates(supermarkets_df)
supermarkets <- supermarkets[,c(1,2)]
dist_hdb_supermarkets <- spDists(hdb, supermarkets, longlat=FALSE)
The following code chunk is similar to 4.4.3
write.csv(dist_hdb_supermarkets, "data/newaspatial/hdbtosupermarkets.csv", row.names = FALSE)
The following code chunk is similar to 4.4.4
dist_hdb_supermarkets_df <- data.frame(dist_hdb_supermarkets)
dist_hdb_supermarkets_min <- dist_hdb_supermarkets_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X526)))
PROX_SUPERMARKETS <- dist_hdb_supermarkets_min$Min/1000
The following code chunk is similar to 4.4.5
hdb_updated <- cbind(hdb_updated, PROX_SUPERMARKETS)
The following code chunk is similar to 4.4.6
write.csv(hdb_updated, "data/newaspatial/hdbupdated5.csv", row.names = FALSE)
The following code chunk imports data and assigned projection crs of Singapore 3414.
clinics_sf <- st_read("data/independentvar/extracted/chas-clinics-geojson.geojson") %>%
st_transform(crs = 3414)
There are 1167 rows of CHAS clinics.
The following code chunk is similar to 4.4.2
hdb_updated_geometry <- st_as_sf(hdb_updated,
coords = c("LONGITUDE",
"LATITUDE"),
crs=4326) %>%
st_transform(crs = 3414)
hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)
clinics_df <- clinics_sf$geometry
clinics <- st_coordinates(clinics_df)
clinics <- clinics[,c(1,2)]
dist_hdb_clinics <- spDists(hdb, clinics, longlat=FALSE)
The following code chunk is similar to 4.4.3
write.csv(dist_hdb_clinics, "data/newaspatial/hdbtoclinics.csv", row.names = FALSE)
The following code chunk is similar to 4.4.4
dist_hdb_clinics_df <- data.frame(dist_hdb_clinics)
dist_hdb_clinics_min <- dist_hdb_clinics_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X1167)))
PROX_CLINICS <- dist_hdb_clinics_min$Min/1000
The following code chunk is similar to 4.4.5
hdb_updated <- cbind(hdb_updated, PROX_CLINICS)
The following code chunk is similar to 4.4.6
write.csv(hdb_updated, "data/newaspatial/hdbupdated6.csv", row.names = FALSE)
The following code chunks import data and assigned projected crs of 3414.
kindergartens_sf <- st_read("data/independentvar/extracted/kindergartens-geojson.geojson") %>%
st_transform(crs = 3414)
There are 448 rows of kindergartens.
The following code chunk is similar to 4.4.2
hdb_updated_geometry <- st_as_sf(hdb_updated,
coords = c("LONGITUDE",
"LATITUDE"),
crs=4326) %>%
st_transform(crs = 3414)
hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)
kindergartens_df <- kindergartens_sf$geometry
kindergartens <- st_coordinates(kindergartens_df)
kindergartens <- kindergartens[,c(1,2)]
dist_hdb_kindergartens <- spDists(hdb, kindergartens, longlat=FALSE)
The following code chunk is similar to 4.4.3
write.csv(dist_hdb_kindergartens, "data/newaspatial/hdbtokindergartens.csv", row.names = FALSE)
The following code chunk is similar to 4.4.4
dist_hdb_kindergartens_df <- data.frame(dist_hdb_kindergartens)
dist_hdb_kindergartens_min <- dist_hdb_kindergartens_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X448)))
PROX_KINDERGARTENS <- dist_hdb_kindergartens_min$Min/1000
The following code chunk calculates the number of points with distance lower than 350m, which means number of points within 350m, using rowSum function of base package.
dist_hdb_kindergartens_df$num_within_350m <- rowSums(dist_hdb_kindergartens_df <= 350)
NUM_KINDERGARDEN_350M <- dist_hdb_kindergartens_df$num_within_350m
The following code chunk combines PROX_KINDERGARTENS and NUM_KINDERGARDEN_350M with the most updated data, hdb_updated, via cbind function of base package.
hdb_updated <- cbind(hdb_updated, PROX_KINDERGARTENS, NUM_KINDERGARDEN_350M)
The following code chunk is similar to 4.4.6
write.csv(hdb_updated, "data/newaspatial/hdbupdated7.csv", row.names = FALSE)
The following code chunks import data and assigned projected crs of 3414.
childcare_sf <- st_read("data/independentvar/extracted/child-care-services-geojson.geojson") %>%
st_transform(crs = 3414)
There are 1545 rows of childcare centres.
The following code chunk is similar to 4.4.2
hdb_updated_geometry <- st_as_sf(hdb_updated,
coords = c("LONGITUDE",
"LATITUDE"),
crs=4326) %>%
st_transform(crs = 3414)
hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)
childcare_df <- childcare_sf$geometry
childcare <- st_coordinates(childcare_df)
childcare <- childcare[,c(1,2)]
dist_hdb_childcare <- spDists(hdb, childcare, longlat=FALSE)
The following code chunk is similar to 4.4.3
write.csv(dist_hdb_childcare, "data/newaspatial/hdbtochildcare.csv", row.names = FALSE)
The following code chunk is similar to 4.4.4
dist_hdb_childcare_df <- data.frame(dist_hdb_childcare)
dist_hdb_childcare_min <- dist_hdb_childcare_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X1545)))
PROX_CHILDCARE <- dist_hdb_childcare_min$Min/1000
The following code chunk is similar to 4.9.5
dist_hdb_childcare_df$num_within_350m <- rowSums(dist_hdb_childcare_df <= 350)
NUM_CHILDCARE_350M <- dist_hdb_childcare_df$num_within_350m
The following code chunk is similar to 4.9.6
hdb_updated <- cbind(hdb_updated, PROX_CHILDCARE, NUM_CHILDCARE_350M)
The following code chunk is similar to 4.4.6
write.csv(hdb_updated, "data/newaspatial/hdbupdated8.csv", row.names = FALSE)
The following code chunk imports busstop shapefile.
busstop_sf <- st_read(dsn = "data/independentvar/extracted", layer="BusStop")
There are 5156 rows of eldercare. The projected CRS is already in SVY21.
The following code chunk is similar to 4.4.2
hdb_updated_geometry <- st_as_sf(hdb_updated,
coords = c("LONGITUDE",
"LATITUDE"),
crs=4326) %>%
st_transform(crs = 3414)
hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)
busstop_df <- busstop_sf$geometry
busstop <- st_coordinates(busstop_df)
busstop <- busstop[,c(1,2)]
dist_hdb_busstop <- spDists(hdb, busstop, longlat=FALSE)
The following code chunk is similar to 4.4.3
write.csv(dist_hdb_busstop, "data/newaspatial/hdbtobusstop.csv", row.names = FALSE)
The following code chunk is similar to 4.4.4
dist_hdb_busstop_df <- data.frame(dist_hdb_busstop)
dist_hdb_busstop_min <- dist_hdb_busstop_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X5156)))
PROX_BUSSTOP <- dist_hdb_busstop_min$Min/1000
The following code chunk is similar to 4.9.5
dist_hdb_busstop_df$num_within_350m <- rowSums(dist_hdb_busstop_df <= 350)
NUM_BUSSTOP_350M <- dist_hdb_busstop_df$num_within_350m
The following code chunk is similar to 4.9.6
hdb_updated <- cbind(hdb_updated, PROX_BUSSTOP, NUM_BUSSTOP_350M)
The following code chunk is similar to 4.4.6
write.csv(hdb_updated, "data/newaspatial/hdbupdated9.csv", row.names = FALSE)
The following code chunk reads the csv file of school using read_csv function of readr package.
schools_df <- read_csv("data/independentvar/extracted/Singapore+Schools-2021-10-21.csv")
There are 367 rows of schools data.
The following code chunk filters rows that contains PRIMRY SCHOOL in the school name using grepl function of base
The following code chunk is similar to 4.1.2
for (i in 1:nrow(primary_schools_df)) {
postalcode <- primary_schools_df[i,'Zip/Postal Code']
url = paste('https://developers.onemap.sg/commonapi/search?searchVal=', postalcode, '&returnGeom=Y&getAddrDetails=Y&pageNum=1')
encoded_url <- URLencode(url)
getcoordinate <- GET(encoded_url)
jsonParsed <- content(getcoordinate,as="parsed")
if (length(jsonParsed$results) > 0) {
primary_schools_df[i,'LATITUDE'] = jsonParsed$results[[1]]$LATITUDE
primary_schools_df[i,'LONGITUDE'] = jsonParsed$results[[1]]$LONGITUDE
}
}
There are three rows of schools with missing NA.
The following code chunk assign lat long manually (searched from google) to the rows with missing LATITUDE and LONGITUDE
primary_schools_df[8,'LATITUDE'] = '1.2753515360141332'
primary_schools_df[8,'LONGITUDE'] = '103.83994593723308'
primary_schools_df[37,'LATITUDE'] = '1.2751443823249178'
primary_schools_df[37,'LONGITUDE'] = '103.82391208407438'
primary_schools_df[100,'LATITUDE'] = '1.3252154625658445'
primary_schools_df[100,'LONGITUDE'] = '103.88172874174562'
The following code chunk only selects necessary column.
primary_schools_df_selected <- primary_schools_df %>% select('Name', 'Address', 'Zip/Postal Code', 'LATITUDE', 'LONGITUDE')
There are 156 rows of primary school after filtering.
write.csv(primary_schools_df_selected, "data/newaspatial/primaryschoollatlong.csv", row.names = FALSE)
The following code chunk converts the dataset with latlong decimal degree to metres coordinates, and transform the crs to Singapore’s projection, 3414.
primary_schools_sf <- st_as_sf(primary_schools_df_selected,
coords = c("LONGITUDE",
"LATITUDE"),
crs=4326) %>%
st_transform(crs = 3414)
The following code chunk is similar to 4.4.2
hdb_updated_geometry <- st_as_sf(hdb_updated,
coords = c("LONGITUDE",
"LATITUDE"),
crs=4326) %>%
st_transform(crs = 3414)
hdb_df <- hdb_updated_geometry$geometry
hdb <- st_coordinates(hdb_df)
prisch_df <- primary_schools_sf$geometry
prisch <- st_coordinates(prisch_df)
prisch <- prisch[,c(1,2)]
dist_hdb_prisch <- spDists(hdb, prisch, longlat=FALSE)
write.csv(dist_hdb_prisch, "data/newaspatial/hdbtoprisch.csv", row.names = FALSE)
The following code chunk is similar to 4.4.4
dist_hdb_prisch_df <- data.frame(dist_hdb_prisch)
dist_hdb_prisch_min <- dist_hdb_prisch_df %>% rowwise() %>% mutate(Min = min(c_across(X1:X156)))
PROX_PRISCH <- dist_hdb_prisch_min$Min/1000
The following code chunk is similar to 4.9.5. The only difference is that we are counting the number of point below 1km.
dist_hdb_prisch_df$num_within_1km <- rowSums(dist_hdb_prisch_df <= 1000)
NUM_PRISCH_1KM <- dist_hdb_prisch_df$num_within_1km
The following code chunk is similar to 4.9.6
hdb_updated <- cbind(hdb_updated, PROX_PRISCH, NUM_PRISCH_1KM)
write.csv(hdb_updated, "data/newaspatial/hdbupdated10.csv", row.names = FALSE)
The following code chunk creates dummy variable for storey_range column using dummy function of dummies package.
hdb_updated1 <- cbind(hdb_updated, dummy(hdb_updated$storey_range, sep = "_"))
The following code chunk replace the dummy’s column name from “hdb_updated_01 TO 03” format to “storey_01TO03” format using gsub function of base package.
The following code chunk then select all the require column in hdb_final using select function
hdb_final <- hdb_updated1 %>%
select(address, storey_range, LATITUDE, LONGITUDE, resale_price, floor_area_sqm, remaining_lease_year, PROX_CBD, PROX_ELDERLYCARE, PROX_HAWKER, PROX_MRTLRT, PROX_PARKS, PROX_SUPERMARKETS, PROX_CLINICS, NUM_KINDERGARDEN_350M, NUM_CHILDCARE_350M, NUM_BUSSTOP_350M, NUM_PRISCH_1KM, `storey_01TO03`, `storey_04TO06`, `storey_07TO09`, `storey_10TO12`, `storey_13TO15`, `storey_16TO18`, `storey_19TO21`, `storey_22TO24`, `storey_25TO27`, `storey_28TO30`, `storey_31TO33`, `storey_34TO36`, `storey_37TO39`, `storey_40TO42`, `storey_43TO45`, `storey_46TO48`, `storey_49TO51`)
The following code chunk save hdb_final as csv using write.csv function
write.csv(hdb_final, "data/aspatial/hdb_final.csv", row.names = FALSE)
The following code chunk import MP14_SUBZONE_WEB_PL using st_read function of sf package.
mpsz = st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\kwekyichen\IS415_blog\_posts\2021-10-27-takehome-ex03\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
The following code chunk update the EPSG code 3414 using st_transform of sf package and retrieve coordinate using st_crs of sf package
mpsz_svy21 <- st_transform(mpsz, 3414)
st_crs(mpsz_svy21)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
We can see that the EPSG is 3414 which is correct.
The following code chunk read hdb_final csv file as tibble data frame using read_csv function and glimpse to display the data
hdb_final <- read_csv("data/aspatial/hdb_final.csv")
glimpse(hdb_final)
Rows: 15,901
Columns: 35
$ address <chr> "8 ST. GEORGE'S LANE", "1 ST. GEORGE'S~
$ storey_range <chr> "01 TO 03", "04 TO 06", "01 TO 03", "0~
$ LATITUDE <dbl> 1.322127, 1.322143, 1.322102, 1.322123~
$ LONGITUDE <dbl> 103.8598, 103.8605, 103.8621, 103.8622~
$ resale_price <dbl> 435000, 370000, 440000, 470000, 488000~
$ floor_area_sqm <dbl> 93, 82, 91, 93, 104, 93, 93, 104, 104,~
$ remaining_lease_year <dbl> 62, 56, 64, 64, 64, 64, 60, 64, 76, 60~
$ PROX_CBD <dbl> 7.047, 7.053, 6.795, 6.795, 5.879, 7.0~
$ PROX_ELDERLYCARE <dbl> 0.4742022, 0.4081859, 0.2889761, 0.287~
$ PROX_HAWKER <dbl> 0.4852265, 0.4262707, 0.3345345, 0.334~
$ PROX_MRTLRT <dbl> 0.3704969, 0.3300307, 0.3029719, 0.306~
$ PROX_PARKS <dbl> 0.7013871, 0.6186203, 0.4548956, 0.446~
$ PROX_SUPERMARKETS <dbl> 0.06992773, 0.13996918, 0.30870224, 0.~
$ PROX_CLINICS <dbl> 0.15861481, 0.08108943, 0.12433439, 0.~
$ NUM_KINDERGARDEN_350M <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1,~
$ NUM_CHILDCARE_350M <dbl> 4, 4, 5, 5, 4, 4, 4, 4, 4, 4, 6, 5, 4,~
$ NUM_BUSSTOP_350M <dbl> 4, 5, 5, 5, 3, 5, 5, 5, 3, 5, 1, 5, 5,~
$ NUM_PRISCH_1KM <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
$ storey_01TO03 <dbl> 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,~
$ storey_04TO06 <dbl> 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1,~
$ storey_07TO09 <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0,~
$ storey_10TO12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_13TO15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_16TO18 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_19TO21 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_22TO24 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_25TO27 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_28TO30 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_31TO33 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_34TO36 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_37TO39 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_40TO42 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_43TO45 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_46TO48 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ storey_49TO51 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
There are 15901 rows of data and 35 columns as shown above.
The following code chunk Checks if there are any missing values in the data using sum and is.na function of base package
There are no missing values in the dataset.
The following code chunk converts hdb_final dataframe into sf dataframe using st_as_sf function of sf package. The coordinate is then converted into svy21 EPSG:3414 FROM WGS84 using st_transform function.
Simple feature collection with 6 features and 33 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 30853.96 ymin: 33816.31 xmax: 31212.98 ymax: 33821.39
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 x 34
address storey_range resale_price floor_area_sqm remaining_lease~
<chr> <chr> <dbl> <dbl> <dbl>
1 8 ST. GEORGE'S LANE 01 TO 03 435000 93 62
2 1 ST. GEORGE'S RD 04 TO 06 370000 82 56
3 23 ST. GEORGE'S RD 01 TO 03 440000 91 64
4 14 ST. GEORGE'S RD 07 TO 09 470000 93 64
5 10 ST. GEORGE'S RD 01 TO 03 488000 104 64
6 20 ST. GEORGE'S RD 01 TO 03 385000 93 64
# ... with 29 more variables: PROX_CBD <dbl>, PROX_ELDERLYCARE <dbl>,
# PROX_HAWKER <dbl>, PROX_MRTLRT <dbl>, PROX_PARKS <dbl>,
# PROX_SUPERMARKETS <dbl>, PROX_CLINICS <dbl>,
# NUM_KINDERGARDEN_350M <dbl>, NUM_CHILDCARE_350M <dbl>,
# NUM_BUSSTOP_350M <dbl>, NUM_PRISCH_1KM <dbl>,
# storey_01TO03 <dbl>, storey_04TO06 <dbl>, storey_07TO09 <dbl>,
# storey_10TO12 <dbl>, storey_13TO15 <dbl>, ...
The following code chunk plots the distribution of resale_price using ggplot function of ggplot2 package.
ggplot(data=hdb_final_sf, aes(x= `resale_price`)) +
geom_histogram(bins=20, color="black", fill="light blue")

From the above, we can see a normal distribution. Majority of the four room hdb were transacted for about $300000 to 400000.
Next, we are interested to see the distribution of other variables.
The following code chunk plots multiple distribution of other variables using ggplot function of ggplot2 package and arrange them in columns of 3 using ggarrange function of ggpubr package. Note that NUM_KINDERGARTENS_350M, NUM_CHILDCARE_350M, NUM_BUSSTOPS_350M and NUM_PRISCH_1KM are not plotted because the data is in count.
floor_area_sqm <- ggplot(data=hdb_final_sf, aes(x= `floor_area_sqm`)) +
geom_histogram(bins=20, color="black", fill="light blue")
remaining_lease_year <- ggplot(data=hdb_final_sf, aes(x= `remaining_lease_year`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_CBD <- ggplot(data=hdb_final_sf, aes(x= `PROX_CBD`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_ELDERLYCARE <- ggplot(data=hdb_final_sf, aes(x= `PROX_ELDERLYCARE`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_HAWKER <- ggplot(data=hdb_final_sf, aes(x= `PROX_HAWKER`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_MRTLRT <- ggplot(data=hdb_final_sf, aes(x= `PROX_MRTLRT`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_PARKS <- ggplot(data=hdb_final_sf, aes(x= `PROX_PARKS`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_SUPERMARKETS <- ggplot(data=hdb_final_sf, aes(x= `PROX_SUPERMARKETS`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_CLINICS <- ggplot(data=hdb_final_sf, aes(x= `PROX_CLINICS`)) +
geom_histogram(bins=20, color="black", fill="light blue")
ggarrange(floor_area_sqm, remaining_lease_year, PROX_CBD, PROX_ELDERLYCARE, PROX_HAWKER, PROX_MRTLRT, PROX_PARKS, PROX_SUPERMARKETS, PROX_CLINICS, ncol = 3, nrow = 3)

From the above, we can see that most of variables are normally distributed. The remaining lease year seems to be normally distributed but the only exception is that it has a high count of transacted four-room hdb with remaining_lease_year of above 90 years. There is no need for any transformation since the variables are normally distributed.
Resale Price
The following code chunk:
tmap_mode("view")
tm_shape(mpsz_svy21)+
tm_polygons()+
tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_sf) +
tm_dots(col = "resale_price",
alpha = 0.6,
style="quantile") +
tm_view(set.zoom.limits = c(10,15))
tmap_mode("plot")
From the interactive map, we can see that hdb in central and south area tends to have higher resale price (darker red points) while hdb in the west have lower resale price as seen that there are more light yellow dots in the west.
To avoid multicolinearity, we need to ensure that the independent variables are not highly correlated before we build a multiple regression model.
The following code chunk:
corrplot(cor(hdb_final[, 6:35]), diag = FALSE, order = "AOE",
tl.pos = "td", tl.cex = 0.7, number.cex=0.5, method = "number", type = "upper")

From the scatterplot above, we can see that the independent variable are not highly correlated to each other. Hence, we can include all of them in the subsequent model building.
The following code chunk builds a multiple linear regression model using lm function of stats package.
Note that storey49TO51 dummy variable is excluded since because we should only have n-1 dummy variable. It will have a strong correlation with other independent variable if we were to include it.
hdb_final_mlr <- lm(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER + PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + PROX_CLINICS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M + NUM_PRISCH_1KM + storey_01TO03 + storey_04TO06 + storey_07TO09 + storey_10TO12 + storey_13TO15 + storey_16TO18 + storey_19TO21 + storey_22TO24 + storey_25TO27 + storey_28TO30 + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42 + storey_43TO45 + storey_46TO48, data=hdb_final_sf)
summary(hdb_final_mlr)
Call:
lm(formula = resale_price ~ floor_area_sqm + remaining_lease_year +
PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER + PROX_MRTLRT +
PROX_PARKS + PROX_SUPERMARKETS + PROX_CLINICS + NUM_KINDERGARDEN_350M +
NUM_CHILDCARE_350M + NUM_BUSSTOP_350M + NUM_PRISCH_1KM +
storey_01TO03 + storey_04TO06 + storey_07TO09 + storey_10TO12 +
storey_13TO15 + storey_16TO18 + storey_19TO21 + storey_22TO24 +
storey_25TO27 + storey_28TO30 + storey_31TO33 + storey_34TO36 +
storey_37TO39 + storey_40TO42 + storey_43TO45 + storey_46TO48,
data = hdb_final_sf)
Residuals:
Min 1Q Median 3Q Max
-206792 -43837 -4773 38076 481414
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 552176.55 45944.42 12.018 < 2e-16 ***
floor_area_sqm 2846.40 75.82 37.543 < 2e-16 ***
remaining_lease_year 3951.51 45.91 86.066 < 2e-16 ***
PROX_CBD -12045.92 131.23 -91.795 < 2e-16 ***
PROX_ELDERLYCARE -14597.19 824.84 -17.697 < 2e-16 ***
PROX_HAWKER -20294.68 1091.64 -18.591 < 2e-16 ***
PROX_MRTLRT -29591.48 1463.09 -20.225 < 2e-16 ***
PROX_PARKS -5484.54 1238.29 -4.429 9.52e-06 ***
PROX_SUPERMARKETS -33703.18 3709.69 -9.085 < 2e-16 ***
PROX_CLINICS -4357.53 5398.76 -0.807 0.419601
NUM_KINDERGARDEN_350M 6904.94 531.68 12.987 < 2e-16 ***
NUM_CHILDCARE_350M -5115.76 289.59 -17.665 < 2e-16 ***
NUM_BUSSTOP_350M 783.85 183.74 4.266 2.00e-05 ***
NUM_PRISCH_1KM -1441.23 435.29 -3.311 0.000932 ***
storey_01TO03 -445761.68 45118.86 -9.880 < 2e-16 ***
storey_04TO06 -426566.67 45113.33 -9.455 < 2e-16 ***
storey_07TO09 -412429.94 45113.33 -9.142 < 2e-16 ***
storey_10TO12 -404137.43 45114.48 -8.958 < 2e-16 ***
storey_13TO15 -402869.35 45122.64 -8.928 < 2e-16 ***
storey_16TO18 -395007.62 45144.40 -8.750 < 2e-16 ***
storey_19TO21 -363971.04 45234.37 -8.046 9.13e-16 ***
storey_22TO24 -364932.98 45297.31 -8.056 8.41e-16 ***
storey_25TO27 -323362.22 45438.00 -7.117 1.15e-12 ***
storey_28TO30 -249803.07 45582.74 -5.480 4.31e-08 ***
storey_31TO33 -234063.59 46310.40 -5.054 4.37e-07 ***
storey_34TO36 -239785.16 46553.05 -5.151 2.62e-07 ***
storey_37TO39 -173581.77 46456.27 -3.736 0.000187 ***
storey_40TO42 -135850.95 47072.91 -2.886 0.003907 **
storey_43TO45 -92521.17 58167.26 -1.591 0.111718
storey_46TO48 -66834.46 58162.92 -1.149 0.250536
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 63710 on 15871 degrees of freedom
Multiple R-squared: 0.7192, Adjusted R-squared: 0.7187
F-statistic: 1401 on 29 and 15871 DF, p-value: < 2.2e-16
From the above, we can see that there are some independent variables that are statistically not significant. Variables such as PROX_CLINICS, storey_43TO45 and storey_46TO48 have p-value above 0.05 and are not statistically significant. We will exclude these variable when we calibrate the revised model.
The following code chunk:
hdb_final_mlr1 <- lm(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER + PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M + NUM_PRISCH_1KM + storey_01TO03 + storey_04TO06 + storey_07TO09 + storey_10TO12 + storey_13TO15 + storey_16TO18 + storey_19TO21 + storey_22TO24 + storey_25TO27 + storey_28TO30 + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42, data=hdb_final_sf)
ols_regress(hdb_final_mlr1)
Model Summary
----------------------------------------------------------------------
R 0.848 RMSE 63714.701
R-Squared 0.719 Coef. Var 14.695
Adj. R-Squared 0.719 MSE 4059563136.315
Pred R-Squared 0.718 MAE 49921.903
----------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------------------
Regression 1.649773e+14 26 6.345282e+12 1563.046 0.0000
Residual 6.444151e+13 15874 4059563136.315
Total 2.294188e+14 15900
--------------------------------------------------------------------------------
Parameter Estimates
------------------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
------------------------------------------------------------------------------------------------------------------
(Intercept) 492471.796 24238.793 20.318 0.000 444961.012 539982.580
floor_area_sqm 2841.525 75.530 0.168 37.621 0.000 2693.478 2989.573
remaining_lease_year 3954.416 45.761 0.423 86.415 0.000 3864.719 4044.112
PROX_CBD -12042.669 131.166 -0.571 -91.812 0.000 -12299.770 -11785.568
PROX_ELDERLYCARE -14661.764 820.820 -0.081 -17.862 0.000 -16270.665 -13052.863
PROX_HAWKER -20468.605 1071.005 -0.088 -19.112 0.000 -22567.896 -18369.314
PROX_MRTLRT -29671.643 1459.634 -0.096 -20.328 0.000 -32532.692 -26810.594
PROX_PARKS -5518.565 1237.964 -0.021 -4.458 0.000 -7945.114 -3092.015
PROX_SUPERMARKETS -34816.859 3440.566 -0.046 -10.120 0.000 -41560.759 -28072.960
NUM_KINDERGARDEN_350M 6906.444 531.636 0.058 12.991 0.000 5864.376 7948.512
NUM_CHILDCARE_350M -5102.534 289.316 -0.084 -17.637 0.000 -5669.625 -4535.442
NUM_BUSSTOP_350M 789.801 183.597 0.019 4.302 0.000 429.929 1149.673
NUM_PRISCH_1KM -1463.671 434.420 -0.019 -3.369 0.001 -2315.183 -612.159
storey_01TO03 -386171.260 22650.294 -1.221 -17.049 0.000 -430568.405 -341774.115
storey_04TO06 -366967.637 22639.285 -1.286 -16.209 0.000 -411343.205 -322592.069
storey_07TO09 -352833.427 22639.223 -1.191 -15.585 0.000 -397208.873 -308457.981
storey_10TO12 -344549.476 22641.483 -1.114 -15.218 0.000 -388929.352 -300169.601
storey_13TO15 -343277.945 22656.961 -0.853 -15.151 0.000 -387688.159 -298867.732
storey_16TO18 -335436.258 22700.335 -0.612 -14.777 0.000 -379931.490 -290941.027
storey_19TO21 -304452.078 22877.920 -0.344 -13.308 0.000 -349295.395 -259608.760
storey_22TO24 -305371.725 23002.292 -0.290 -13.276 0.000 -350458.826 -260284.624
storey_25TO27 -263781.805 23279.414 -0.193 -11.331 0.000 -309412.097 -218151.514
storey_28TO30 -190281.671 23557.597 -0.118 -8.077 0.000 -236457.234 -144106.107
storey_31TO33 -174526.499 24934.671 -0.069 -6.999 0.000 -223401.283 -125651.715
storey_34TO36 -180255.027 25383.025 -0.065 -7.101 0.000 -230008.635 -130501.419
storey_37TO39 -114023.613 25207.157 -0.043 -4.523 0.000 -163432.500 -64614.727
storey_40TO42 -76306.689 26324.957 -0.024 -2.899 0.004 -127906.592 -24706.786
------------------------------------------------------------------------------------------------------------------
From the above, the adjusted R-squared for the revised multiple linear regression model is 0.719.
The following code chunk test for sign of multicolinearity using ols_vif_tol function of olsrr package.
ols_vif_tol(hdb_final_mlr1)
There are some independent variables with VIF value above 10 which means that there are sign of multicollinearity. Hence we would want to calibrate the model again by removing independent variables with VIF above 10. Independent variables to be removed includes storey_01TO03, storey_04TO06, storey_07TO09, storey_10TO12, storey_13TO15, storey_16TO18, storey_19TO21, storey_22TO24, storey_25TO27 and storey_28TO30.
Revise the model
The following code chunk:
hdb_final_mlr2 <- lm(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER + PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M + NUM_PRISCH_1KM + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42, data=hdb_final_sf)
Checks for multicolinearity again
ols_vif_tol(hdb_final_mlr2)
Variables Tolerance VIF
1 floor_area_sqm 0.8922755 1.120730
2 remaining_lease_year 0.8230281 1.215025
3 PROX_CBD 0.4680141 2.136688
4 PROX_ELDERLYCARE 0.8642598 1.157060
5 PROX_HAWKER 0.8407733 1.189381
6 PROX_MRTLRT 0.8069432 1.239245
7 PROX_PARKS 0.8287724 1.206604
8 PROX_SUPERMARKETS 0.8720835 1.146679
9 NUM_KINDERGARDEN_350M 0.8840377 1.131173
10 NUM_CHILDCARE_350M 0.7870752 1.270527
11 NUM_BUSSTOP_350M 0.8925807 1.120347
12 NUM_PRISCH_1KM 0.5821821 1.717676
13 storey_31TO33 0.9867316 1.013447
14 storey_34TO36 0.9874545 1.012705
15 storey_37TO39 0.9851150 1.015110
16 storey_40TO42 0.9902039 1.009893
From the above, We can conclude that there are no sign of multicollinearity among the independent variables since the VIF value are below 10.
Ordinary least squares regression
The following code chunk run ols_regress function of olsrr package for the new model hdb_final_mlr2
ols_regress(hdb_final_mlr2)
Model Summary
----------------------------------------------------------------------
R 0.825 RMSE 67836.494
R-Squared 0.681 Coef. Var 15.645
Adj. R-Squared 0.681 MSE 4601789951.266
Pred R-Squared 0.681 MAE 52876.816
----------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------------------
Regression 1.56324e+14 16 9.770251e+12 2123.141 0.0000
Residual 7.309483e+13 15884 4601789951.266
Total 2.294188e+14 15900
--------------------------------------------------------------------------------
Parameter Estimates
---------------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
---------------------------------------------------------------------------------------------------------------
(Intercept) 130966.555 8992.671 14.564 0.000 113339.900 148593.210
floor_area_sqm 2696.456 80.210 0.159 33.617 0.000 2539.235 2853.677
remaining_lease_year 4457.430 46.172 0.477 96.539 0.000 4366.926 4547.933
PROX_CBD -12837.514 138.113 -0.609 -92.949 0.000 -13108.231 -12566.796
PROX_ELDERLYCARE -14858.147 873.307 -0.082 -17.014 0.000 -16569.927 -13146.367
PROX_HAWKER -23114.663 1136.928 -0.099 -20.331 0.000 -25343.170 -20886.156
PROX_MRTLRT -32911.196 1546.944 -0.106 -21.275 0.000 -35943.383 -29879.010
PROX_PARKS -4179.392 1317.251 -0.016 -3.173 0.002 -6761.354 -1597.431
PROX_SUPERMARKETS -32735.045 3661.137 -0.043 -8.941 0.000 -39911.288 -25558.801
NUM_KINDERGARDEN_350M 7183.877 565.379 0.061 12.706 0.000 6075.669 8292.084
NUM_CHILDCARE_350M -5197.532 307.576 -0.085 -16.898 0.000 -5800.416 -4594.649
NUM_BUSSTOP_350M 951.826 195.226 0.023 4.876 0.000 569.161 1334.491
NUM_PRISCH_1KM -2707.816 459.614 -0.035 -5.891 0.000 -3608.712 -1806.920
storey_31TO33 162800.059 11394.750 0.064 14.287 0.000 140465.057 185135.061
storey_34TO36 156339.507 12475.395 0.056 12.532 0.000 131886.319 180792.695
storey_37TO39 221860.148 12094.346 0.083 18.344 0.000 198153.859 245566.437
storey_40TO42 259042.577 14544.218 0.080 17.811 0.000 230534.261 287550.892
---------------------------------------------------------------------------------------------------------------
From the above, the adjusted R-squared for the revised multiple linear regression model is 0.681, lower than the initial revised model’s adjusted R-square value of 0.719.
The following code chunk checks the linear relationsihp assumption using ols_plot_resid_fit function of olsrr.
ols_plot_resid_fit(hdb_final_mlr2)

From the above, most of the data point are scattered near the 0 line. Around the middle area with fitted value 0.00006, there are slightly more data points with residual above 0. In general, we can conclude that the relationships between the dependent variable and independent variables are linear.
The following code chunk checks normality assumption using ols_plot_resid_hist function of oslrr package
ols_plot_resid_hist(hdb_final_mlr2)

From the above, we can see that the residual of the revised multiple linear regression model resembles a normal distribution.
To perform spatial autocorrelation test, we need to convert hdb_final_sf simple into a SpatialPointsDataFrame.
Export residuals
The following code chunk export the residuals of the multiple linear regression model we have build and save it as data frame
mlr.output <- as.data.frame(hdb_final_mlr2$residuals)
Combine data frame
The following code chunk combines the residual output data frame with hdb_final_sf data frame using cbind function of base package, and rename the residual column to MLR_RES.
hdb_final_res_sf <- cbind(hdb_final_sf, mlr.output) %>%
rename(`MLR_RES` = `hdb_final_mlr2.residuals`)
convert hdb_final_res_sf into SpatialPointsDataFrame
The following code chunk converts hdb_final_res_sf into SpatialPointsDataFrame using as_Spatial function of sf package.
hdb_final_sp <- as_Spatial(hdb_final_res_sf)
hdb_final_sp
class : SpatialPointsDataFrame
features : 15901
extent : 11593.88, 42621.21, 28217.39, 48741.03 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 34
names : address, storey_range, resale_price, floor_area_sqm, remaining_lease_year, PROX_CBD, PROX_ELDERLYCARE, PROX_HAWKER, PROX_MRTLRT, PROX_PARKS, PROX_SUPERMARKETS, PROX_CLINICS, NUM_KINDERGARDEN_350M, NUM_CHILDCARE_350M, NUM_BUSSTOP_350M, ...
min values : 1 CHAI CHEE RD, 01 TO 03, 218000, 74, 45, 1.813, 3.348979e-07, 0.03333586, 0.02204073, 0.04416432, 1.648702e-08, 4.555479e-09, 0, 0, 0, ...
max values : 9B BOON TIONG RD, 49 TO 51, 1186888, 138, 97, 29.349, 3.301637, 2.86763, 2.130606, 2.413137, 1.571317, 0.8083327, 7, 20, 19, ...
Distribution of residuals
The following code chunk plots the distribution of residuals on an interactive map with mpsz_svy21 layer.
tmap_mode("view")
tm_shape(mpsz_svy21)+
tm_polygons()+
tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_res_sf) +
tm_dots(col = "MLR_RES",
alpha = 0.6,
style="quantile") +
tm_view(set.zoom.limits = c(10,15))
tmap_mode("plot")
From the above interactive plot, we can see that the residuals are not randomly distributed as there are many of the same color in some area. This means that there is a sign of spatial autocorrelation.
Moran’s I test
We can perform Moran’s I test to check if there is sign of spatial autocorrelation.
Compute distance-based weight matrix
The following code chunk first compute the distance-based weight matrix by using dnearneigh function of spdep. The lower distance bound is set to 0m, upper distance bound set to 1500m, and longlat set to FALSE which means euclidean distance.
nb <- dnearneigh(coordinates(hdb_final_sp), 0, 1500, longlat = FALSE)
summary(nb)
Neighbour list object:
Number of regions: 15901
Number of nonzero links: 10214420
Percentage nonzero weights: 4.039846
Average number of links: 642.376
Link number distribution:
2 14 25 31 46 56 58 61 63 65 68 70 71 72
3 14 20 6 47 4 4 5 1 2 4 1 12 1
79 86 87 88 89 90 91 96 97 99 101 105 106 109
78 2 1 5 4 3 1 2 8 7 4 2 2 2
110 111 112 115 117 118 119 120 121 122 123 124 125 126
5 4 3 1 1 11 4 2 16 5 22 8 50 22
128 129 130 131 132 133 134 135 136 137 138 139 140 141
29 6 2 16 6 8 2 5 6 11 3 25 8 5
142 143 144 145 147 148 149 150 151 152 153 154 156 157
7 2 16 11 15 4 4 3 9 22 15 14 6 6
158 159 160 161 162 163 165 166 167 168 170 171 173 175
9 3 7 14 5 2 27 5 12 5 3 15 11 6
176 177 178 179 180 181 182 183 184 185 186 187 188 189
14 8 8 35 16 7 11 19 15 2 30 15 10 8
190 191 192 193 194 195 196 197 198 199 200 201 202 203
6 19 3 2 3 16 6 15 21 16 14 15 23 1
204 205 206 207 208 209 210 211 212 213 214 215 216 217
15 19 8 4 25 1 6 4 8 10 4 14 9 2
218 219 220 221 222 223 224 225 226 227 228 229 230 231
4 12 27 30 12 5 33 6 5 19 25 5 10 9
232 233 234 235 236 237 238 239 240 241 242 243 244 245
4 11 15 43 18 2 13 7 28 28 9 10 13 11
246 247 248 249 250 251 252 253 254 255 256 257 258 259
7 10 10 19 18 13 5 19 22 15 8 15 16 6
260 261 262 263 264 265 266 267 268 269 270 271 272 273
4 18 23 3 24 12 3 21 26 6 1 26 34 64
274 275 276 277 278 279 280 281 282 283 284 285 286 287
14 30 18 6 65 6 53 29 30 19 51 24 17 48
288 289 290 291 292 293 294 295 296 297 298 299 300 301
55 33 29 34 66 39 16 42 5 16 28 32 25 16
302 303 304 305 306 307 308 309 310 311 312 313 314 315
37 28 21 22 24 10 13 9 8 15 5 35 10 16
316 317 318 319 320 321 322 323 324 325 326 327 328 329
22 9 49 39 34 11 14 41 28 28 35 17 21 28
330 331 332 333 334 335 336 337 338 339 340 341 342 343
18 29 12 35 38 26 3 25 40 18 15 37 12 24
344 345 346 347 348 349 350 351 352 353 354 355 356 357
15 14 11 42 5 39 26 16 25 12 36 21 31 13
358 359 360 361 362 363 364 365 366 367 368 369 370 371
26 20 20 40 7 31 36 25 17 24 17 23 16 16
372 373 374 375 376 377 378 379 380 381 382 383 384 385
26 18 49 19 31 35 10 24 12 9 14 31 5 41
386 387 388 389 390 391 392 393 394 395 396 397 398 399
29 12 31 13 54 9 20 11 13 25 45 26 15 10
400 401 402 403 404 405 406 407 408 410 411 412 413 414
31 23 16 14 29 8 17 25 24 22 25 22 21 10
415 416 417 418 419 420 421 422 423 424 425 426 427 428
26 12 13 19 72 17 19 16 45 12 17 13 45 21
429 430 431 432 433 434 435 436 437 438 439 440 441 442
18 36 32 16 29 21 18 34 10 29 27 20 28 9
443 444 445 446 447 448 449 450 451 452 453 454 455 456
8 11 33 12 12 24 28 16 18 31 17 28 18 9
457 458 459 460 461 462 463 464 465 466 467 468 469 470
16 47 34 15 17 12 13 23 15 7 14 22 8 27
471 472 473 474 475 476 477 478 479 480 481 482 483 484
11 8 16 42 9 24 14 29 27 32 21 8 12 12
485 486 487 488 489 490 491 492 493 494 495 496 497 498
31 8 10 23 6 28 9 2 10 2 11 11 22 7
499 500 501 502 503 504 505 506 507 508 509 510 511 512
22 4 10 7 22 25 14 21 26 16 18 24 25 25
513 514 515 516 517 518 519 520 521 522 523 524 525 526
27 1 18 14 6 7 6 9 12 10 13 13 10 14
527 528 529 530 531 532 533 534 535 536 537 538 539 540
10 14 53 11 19 18 9 8 11 13 9 8 4 4
541 542 543 544 545 546 547 548 549 550 551 552 553 554
5 3 19 20 21 10 8 7 6 10 13 8 17 2
555 556 557 558 559 560 561 562 563 564 565 566 567 568
6 5 10 23 15 21 10 5 13 12 21 22 27 24
569 570 571 572 573 574 575 576 577 578 579 580 581 582
9 12 24 15 22 7 17 25 7 27 21 16 5 8
583 584 585 586 587 588 589 590 591 592 593 594 595 596
6 21 4 12 9 7 17 27 15 26 13 12 15 22
597 598 599 600 601 602 603 604 605 606 607 608 609 610
18 9 8 20 10 16 6 9 15 6 20 4 13 11
611 612 613 614 615 616 618 619 620 621 622 623 624 625
17 23 14 6 14 29 30 6 7 16 3 2 21 3
626 627 628 629 630 631 633 634 635 636 637 638 639 640
13 22 14 13 6 7 7 10 5 17 21 20 4 20
641 642 643 644 645 646 647 648 649 650 651 652 653 654
16 20 5 10 17 12 33 6 1 27 7 6 11 1
655 656 657 658 659 660 662 663 664 665 666 668 669 670
9 11 9 5 17 9 4 6 13 1 2 2 31 7
671 672 673 674 675 676 677 678 679 680 681 682 683 684
18 10 16 1 11 8 6 22 9 1 17 5 6 33
685 686 687 688 689 690 691 692 693 694 695 696 697 698
6 24 9 4 5 7 7 10 20 19 9 32 1 12
699 700 701 702 703 704 705 706 707 708 709 710 711 712
17 8 13 8 3 8 13 26 9 11 9 6 4 8
713 714 715 716 717 718 719 720 721 722 723 724 725 726
5 4 31 33 8 7 8 16 24 9 14 7 4 6
727 728 729 730 731 732 733 734 735 736 737 738 739 740
5 2 2 6 9 31 4 8 3 10 1 3 2 1
741 742 744 745 746 747 748 749 750 751 752 753 754 755
8 12 16 11 14 7 28 12 27 15 23 12 2 24
756 757 758 759 760 761 762 763 764 765 766 767 768 769
11 9 2 15 7 24 9 4 6 16 8 10 9 5
770 771 772 773 774 775 776 777 778 779 780 781 782 783
16 16 3 21 14 21 8 19 14 16 13 6 29 9
784 785 786 787 788 789 790 791 792 793 794 795 796 797
16 35 3 16 26 11 10 10 12 22 17 12 3 17
798 799 800 801 802 803 804 805 806 808 809 810 811 812
17 8 3 15 24 50 15 4 20 10 3 12 43 9
813 814 815 816 817 818 819 820 821 822 823 824 825 826
42 14 9 17 10 17 6 16 14 27 16 16 7 15
827 828 829 830 831 832 833 834 835 836 837 838 839 840
14 33 12 9 12 12 7 9 9 19 17 20 4 43
841 842 843 844 845 846 847 848 849 850 851 852 853 854
14 12 3 7 18 6 11 11 6 13 17 12 18 39
855 856 857 858 859 860 861 862 863 865 866 867 868 869
12 18 2 14 13 6 15 23 4 13 21 14 10 20
870 871 872 873 874 875 876 877 878 879 880 881 882 883
23 17 14 3 4 10 5 4 5 7 8 3 19 18
884 885 886 887 888 889 890 891 892 893 894 895 896 897
12 5 6 2 10 6 3 14 15 20 4 5 7 3
899 900 901 902 903 904 905 906 907 908 909 910 911 912
16 9 16 14 9 15 7 47 5 4 6 3 23 11
913 914 915 916 917 918 919 920 921 922 923 924 925 926
23 2 8 9 1 3 9 23 4 1 14 23 4 4
927 928 929 930 931 932 933 934 937 938 939 941 942 943
7 12 2 7 4 4 6 34 5 14 1 4 1 5
944 945 946 948 950 951 952 953 954 955 956 957 961 962
13 2 17 8 5 1 2 3 24 5 6 3 2 7
964 965 967 969 970 971 974 975 976 977 978 979 980 981
3 13 17 7 29 4 2 6 8 9 21 1 20 2
982 983 984 985 986 987 990 992 993 994 995 996 998 999
18 14 1 3 12 3 4 19 7 1 5 1 1 5
1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1014
7 3 14 5 2 24 5 6 5 9 5 3 8 1
1015 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029
4 1 15 18 6 10 13 1 14 1 7 10 3 7
1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1041 1042 1043 1045
8 1 3 11 1 3 13 18 2 10 3 24 22 1
1046 1047 1048 1049 1050 1051 1053 1054 1055 1056 1060 1061 1062 1063
2 11 11 23 7 18 1 3 6 6 13 11 11 5
1065 1066 1067 1069 1070 1073 1074 1075 1079 1080 1082 1085 1086 1089
7 6 7 2 5 6 6 2 1 10 5 11 13 8
1090 1092 1093 1095 1096 1097 1098 1100 1102 1103 1104 1105 1106 1107
5 20 1 2 4 4 5 10 11 8 3 5 2 5
1109 1111 1112 1113 1115 1117 1118 1119 1120 1122 1123 1125 1127 1129
14 1 11 6 4 7 19 7 2 6 6 5 8 5
1131 1133 1135 1137 1138 1139 1140 1141 1143 1144 1145 1146 1147 1149
4 1 10 13 6 9 16 2 15 20 4 3 15 12
1150 1151 1152 1153 1154 1156 1157 1158 1159 1160 1161 1162 1164 1165
6 8 11 1 6 11 7 19 4 6 9 27 6 3
1166 1168 1169 1171 1172 1173 1175 1177 1178 1179 1180 1181 1182 1186
17 8 6 4 27 22 20 11 7 11 2 7 3 2
1189 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1202 1203 1204
20 18 3 6 16 10 18 7 4 2 15 5 10 15
1205 1206 1207 1208 1210 1212 1213 1215 1216 1217 1218 1221 1222 1223
4 2 4 10 10 5 2 12 19 3 14 13 13 11
1224 1225 1226 1227 1228 1230 1231 1232 1233 1234 1235 1236 1237 1238
3 11 22 46 30 11 7 4 1 5 35 17 10 5
1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1252 1253 1254 1255
14 4 12 2 3 8 9 1 4 1 11 2 8 2
1256 1258 1260 1263 1264 1266 1268 1270 1271 1272 1273 1275 1278 1279
7 3 2 6 8 7 2 3 6 7 12 4 3 7
1281 1283 1284 1286 1287 1288 1289 1291 1292 1293 1294 1295 1296 1297
9 11 8 8 8 3 5 3 8 12 36 1 1 4
1298 1299 1300 1302 1304 1307 1308 1309 1312 1313 1314 1315 1317 1321
2 4 1 9 6 1 6 8 9 6 21 4 10 18
1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335
6 2 12 5 2 19 2 1 8 4 1 1 3 5
1337 1338 1339 1340 1341 1345 1346 1347 1348 1349 1352 1353 1355 1356
16 15 7 9 12 18 20 19 5 5 26 13 3 5
1357 1358 1360 1361 1363 1364 1365 1368 1369 1370 1371 1372 1373 1374
14 3 4 11 7 11 11 17 5 6 1 3 8 12
1376 1377 1379 1381 1382 1383 1384 1385 1386 1387 1389 1390 1393 1394
10 3 5 7 3 26 10 5 5 17 7 9 9 10
1396 1397 1399 1400 1401 1403 1405 1406 1408 1410 1412 1414 1415 1416
7 3 8 20 6 3 14 15 2 12 6 1 4 8
1421 1422 1423 1424 1425 1426 1427 1431 1432 1435 1436 1437 1441 1443
4 1 2 10 2 3 1 2 3 1 4 18 1 1
1444 1445 1452 1454 1459 1460 1462 1463 1466 1467 1470 1472 1473 1474
5 2 10 2 1 9 1 18 1 4 1 1 7 2
1475 1476 1477 1479 1480 1486 1488 1493 1494 1495 1496 1497 1499 1508
2 1 2 1 2 8 3 6 17 3 2 8 3 1
1519 1523 1527 1533 1534 1538 1539 1543 1545 1547 1548 1549 1551 1552
2 3 2 12 1 3 4 2 6 1 2 3 2 1
1553 1558 1561 1564 1565 1567 1569 1570 1573 1574 1577 1579 1586 1587
1 3 4 2 9 3 2 7 2 4 1 2 1 1
1588 1593 1597 1598 1603 1604 1605 1607 1609 1618 1619 1624 1626 1627
8 3 6 3 1 1 1 2 7 1 2 1 2 1
1634 1638 1650 1654 1655 1656 1678 1690 1701 1708 1717 1727 1739 1743
2 16 1 3 1 3 2 1 1 5 1 1 3 1
1747 1753 1764 1771 1776 1785 1795 1796 1808 1824 1859 1875 1917 1922
6 4 3 4 1 3 4 4 5 1 3 3 1 2
1945 1971
5 4
3 least connected regions:
2914 3384 9808 with 2 links
4 most connected regions:
1901 7609 10897 12072 with 1971 links
From the above, we can see that the three least connected regions are row 2914, 3384 and 9808 with 2 links each. The most connected regions are row 1901, 7609, 10897 and 12072 with 1971 links each.
Convert output neighbour into spatial weight
The following code chunk converts nb into spatial weight using nb2listw function of spdep package.
nb_lw <- nb2listw(nb, style = 'W')
summary(nb_lw)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 15901
Number of nonzero links: 10214420
Percentage nonzero weights: 4.039846
Average number of links: 642.376
Link number distribution:
2 14 25 31 46 56 58 61 63 65 68 70 71 72
3 14 20 6 47 4 4 5 1 2 4 1 12 1
79 86 87 88 89 90 91 96 97 99 101 105 106 109
78 2 1 5 4 3 1 2 8 7 4 2 2 2
110 111 112 115 117 118 119 120 121 122 123 124 125 126
5 4 3 1 1 11 4 2 16 5 22 8 50 22
128 129 130 131 132 133 134 135 136 137 138 139 140 141
29 6 2 16 6 8 2 5 6 11 3 25 8 5
142 143 144 145 147 148 149 150 151 152 153 154 156 157
7 2 16 11 15 4 4 3 9 22 15 14 6 6
158 159 160 161 162 163 165 166 167 168 170 171 173 175
9 3 7 14 5 2 27 5 12 5 3 15 11 6
176 177 178 179 180 181 182 183 184 185 186 187 188 189
14 8 8 35 16 7 11 19 15 2 30 15 10 8
190 191 192 193 194 195 196 197 198 199 200 201 202 203
6 19 3 2 3 16 6 15 21 16 14 15 23 1
204 205 206 207 208 209 210 211 212 213 214 215 216 217
15 19 8 4 25 1 6 4 8 10 4 14 9 2
218 219 220 221 222 223 224 225 226 227 228 229 230 231
4 12 27 30 12 5 33 6 5 19 25 5 10 9
232 233 234 235 236 237 238 239 240 241 242 243 244 245
4 11 15 43 18 2 13 7 28 28 9 10 13 11
246 247 248 249 250 251 252 253 254 255 256 257 258 259
7 10 10 19 18 13 5 19 22 15 8 15 16 6
260 261 262 263 264 265 266 267 268 269 270 271 272 273
4 18 23 3 24 12 3 21 26 6 1 26 34 64
274 275 276 277 278 279 280 281 282 283 284 285 286 287
14 30 18 6 65 6 53 29 30 19 51 24 17 48
288 289 290 291 292 293 294 295 296 297 298 299 300 301
55 33 29 34 66 39 16 42 5 16 28 32 25 16
302 303 304 305 306 307 308 309 310 311 312 313 314 315
37 28 21 22 24 10 13 9 8 15 5 35 10 16
316 317 318 319 320 321 322 323 324 325 326 327 328 329
22 9 49 39 34 11 14 41 28 28 35 17 21 28
330 331 332 333 334 335 336 337 338 339 340 341 342 343
18 29 12 35 38 26 3 25 40 18 15 37 12 24
344 345 346 347 348 349 350 351 352 353 354 355 356 357
15 14 11 42 5 39 26 16 25 12 36 21 31 13
358 359 360 361 362 363 364 365 366 367 368 369 370 371
26 20 20 40 7 31 36 25 17 24 17 23 16 16
372 373 374 375 376 377 378 379 380 381 382 383 384 385
26 18 49 19 31 35 10 24 12 9 14 31 5 41
386 387 388 389 390 391 392 393 394 395 396 397 398 399
29 12 31 13 54 9 20 11 13 25 45 26 15 10
400 401 402 403 404 405 406 407 408 410 411 412 413 414
31 23 16 14 29 8 17 25 24 22 25 22 21 10
415 416 417 418 419 420 421 422 423 424 425 426 427 428
26 12 13 19 72 17 19 16 45 12 17 13 45 21
429 430 431 432 433 434 435 436 437 438 439 440 441 442
18 36 32 16 29 21 18 34 10 29 27 20 28 9
443 444 445 446 447 448 449 450 451 452 453 454 455 456
8 11 33 12 12 24 28 16 18 31 17 28 18 9
457 458 459 460 461 462 463 464 465 466 467 468 469 470
16 47 34 15 17 12 13 23 15 7 14 22 8 27
471 472 473 474 475 476 477 478 479 480 481 482 483 484
11 8 16 42 9 24 14 29 27 32 21 8 12 12
485 486 487 488 489 490 491 492 493 494 495 496 497 498
31 8 10 23 6 28 9 2 10 2 11 11 22 7
499 500 501 502 503 504 505 506 507 508 509 510 511 512
22 4 10 7 22 25 14 21 26 16 18 24 25 25
513 514 515 516 517 518 519 520 521 522 523 524 525 526
27 1 18 14 6 7 6 9 12 10 13 13 10 14
527 528 529 530 531 532 533 534 535 536 537 538 539 540
10 14 53 11 19 18 9 8 11 13 9 8 4 4
541 542 543 544 545 546 547 548 549 550 551 552 553 554
5 3 19 20 21 10 8 7 6 10 13 8 17 2
555 556 557 558 559 560 561 562 563 564 565 566 567 568
6 5 10 23 15 21 10 5 13 12 21 22 27 24
569 570 571 572 573 574 575 576 577 578 579 580 581 582
9 12 24 15 22 7 17 25 7 27 21 16 5 8
583 584 585 586 587 588 589 590 591 592 593 594 595 596
6 21 4 12 9 7 17 27 15 26 13 12 15 22
597 598 599 600 601 602 603 604 605 606 607 608 609 610
18 9 8 20 10 16 6 9 15 6 20 4 13 11
611 612 613 614 615 616 618 619 620 621 622 623 624 625
17 23 14 6 14 29 30 6 7 16 3 2 21 3
626 627 628 629 630 631 633 634 635 636 637 638 639 640
13 22 14 13 6 7 7 10 5 17 21 20 4 20
641 642 643 644 645 646 647 648 649 650 651 652 653 654
16 20 5 10 17 12 33 6 1 27 7 6 11 1
655 656 657 658 659 660 662 663 664 665 666 668 669 670
9 11 9 5 17 9 4 6 13 1 2 2 31 7
671 672 673 674 675 676 677 678 679 680 681 682 683 684
18 10 16 1 11 8 6 22 9 1 17 5 6 33
685 686 687 688 689 690 691 692 693 694 695 696 697 698
6 24 9 4 5 7 7 10 20 19 9 32 1 12
699 700 701 702 703 704 705 706 707 708 709 710 711 712
17 8 13 8 3 8 13 26 9 11 9 6 4 8
713 714 715 716 717 718 719 720 721 722 723 724 725 726
5 4 31 33 8 7 8 16 24 9 14 7 4 6
727 728 729 730 731 732 733 734 735 736 737 738 739 740
5 2 2 6 9 31 4 8 3 10 1 3 2 1
741 742 744 745 746 747 748 749 750 751 752 753 754 755
8 12 16 11 14 7 28 12 27 15 23 12 2 24
756 757 758 759 760 761 762 763 764 765 766 767 768 769
11 9 2 15 7 24 9 4 6 16 8 10 9 5
770 771 772 773 774 775 776 777 778 779 780 781 782 783
16 16 3 21 14 21 8 19 14 16 13 6 29 9
784 785 786 787 788 789 790 791 792 793 794 795 796 797
16 35 3 16 26 11 10 10 12 22 17 12 3 17
798 799 800 801 802 803 804 805 806 808 809 810 811 812
17 8 3 15 24 50 15 4 20 10 3 12 43 9
813 814 815 816 817 818 819 820 821 822 823 824 825 826
42 14 9 17 10 17 6 16 14 27 16 16 7 15
827 828 829 830 831 832 833 834 835 836 837 838 839 840
14 33 12 9 12 12 7 9 9 19 17 20 4 43
841 842 843 844 845 846 847 848 849 850 851 852 853 854
14 12 3 7 18 6 11 11 6 13 17 12 18 39
855 856 857 858 859 860 861 862 863 865 866 867 868 869
12 18 2 14 13 6 15 23 4 13 21 14 10 20
870 871 872 873 874 875 876 877 878 879 880 881 882 883
23 17 14 3 4 10 5 4 5 7 8 3 19 18
884 885 886 887 888 889 890 891 892 893 894 895 896 897
12 5 6 2 10 6 3 14 15 20 4 5 7 3
899 900 901 902 903 904 905 906 907 908 909 910 911 912
16 9 16 14 9 15 7 47 5 4 6 3 23 11
913 914 915 916 917 918 919 920 921 922 923 924 925 926
23 2 8 9 1 3 9 23 4 1 14 23 4 4
927 928 929 930 931 932 933 934 937 938 939 941 942 943
7 12 2 7 4 4 6 34 5 14 1 4 1 5
944 945 946 948 950 951 952 953 954 955 956 957 961 962
13 2 17 8 5 1 2 3 24 5 6 3 2 7
964 965 967 969 970 971 974 975 976 977 978 979 980 981
3 13 17 7 29 4 2 6 8 9 21 1 20 2
982 983 984 985 986 987 990 992 993 994 995 996 998 999
18 14 1 3 12 3 4 19 7 1 5 1 1 5
1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1014
7 3 14 5 2 24 5 6 5 9 5 3 8 1
1015 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029
4 1 15 18 6 10 13 1 14 1 7 10 3 7
1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1041 1042 1043 1045
8 1 3 11 1 3 13 18 2 10 3 24 22 1
1046 1047 1048 1049 1050 1051 1053 1054 1055 1056 1060 1061 1062 1063
2 11 11 23 7 18 1 3 6 6 13 11 11 5
1065 1066 1067 1069 1070 1073 1074 1075 1079 1080 1082 1085 1086 1089
7 6 7 2 5 6 6 2 1 10 5 11 13 8
1090 1092 1093 1095 1096 1097 1098 1100 1102 1103 1104 1105 1106 1107
5 20 1 2 4 4 5 10 11 8 3 5 2 5
1109 1111 1112 1113 1115 1117 1118 1119 1120 1122 1123 1125 1127 1129
14 1 11 6 4 7 19 7 2 6 6 5 8 5
1131 1133 1135 1137 1138 1139 1140 1141 1143 1144 1145 1146 1147 1149
4 1 10 13 6 9 16 2 15 20 4 3 15 12
1150 1151 1152 1153 1154 1156 1157 1158 1159 1160 1161 1162 1164 1165
6 8 11 1 6 11 7 19 4 6 9 27 6 3
1166 1168 1169 1171 1172 1173 1175 1177 1178 1179 1180 1181 1182 1186
17 8 6 4 27 22 20 11 7 11 2 7 3 2
1189 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1202 1203 1204
20 18 3 6 16 10 18 7 4 2 15 5 10 15
1205 1206 1207 1208 1210 1212 1213 1215 1216 1217 1218 1221 1222 1223
4 2 4 10 10 5 2 12 19 3 14 13 13 11
1224 1225 1226 1227 1228 1230 1231 1232 1233 1234 1235 1236 1237 1238
3 11 22 46 30 11 7 4 1 5 35 17 10 5
1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1252 1253 1254 1255
14 4 12 2 3 8 9 1 4 1 11 2 8 2
1256 1258 1260 1263 1264 1266 1268 1270 1271 1272 1273 1275 1278 1279
7 3 2 6 8 7 2 3 6 7 12 4 3 7
1281 1283 1284 1286 1287 1288 1289 1291 1292 1293 1294 1295 1296 1297
9 11 8 8 8 3 5 3 8 12 36 1 1 4
1298 1299 1300 1302 1304 1307 1308 1309 1312 1313 1314 1315 1317 1321
2 4 1 9 6 1 6 8 9 6 21 4 10 18
1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335
6 2 12 5 2 19 2 1 8 4 1 1 3 5
1337 1338 1339 1340 1341 1345 1346 1347 1348 1349 1352 1353 1355 1356
16 15 7 9 12 18 20 19 5 5 26 13 3 5
1357 1358 1360 1361 1363 1364 1365 1368 1369 1370 1371 1372 1373 1374
14 3 4 11 7 11 11 17 5 6 1 3 8 12
1376 1377 1379 1381 1382 1383 1384 1385 1386 1387 1389 1390 1393 1394
10 3 5 7 3 26 10 5 5 17 7 9 9 10
1396 1397 1399 1400 1401 1403 1405 1406 1408 1410 1412 1414 1415 1416
7 3 8 20 6 3 14 15 2 12 6 1 4 8
1421 1422 1423 1424 1425 1426 1427 1431 1432 1435 1436 1437 1441 1443
4 1 2 10 2 3 1 2 3 1 4 18 1 1
1444 1445 1452 1454 1459 1460 1462 1463 1466 1467 1470 1472 1473 1474
5 2 10 2 1 9 1 18 1 4 1 1 7 2
1475 1476 1477 1479 1480 1486 1488 1493 1494 1495 1496 1497 1499 1508
2 1 2 1 2 8 3 6 17 3 2 8 3 1
1519 1523 1527 1533 1534 1538 1539 1543 1545 1547 1548 1549 1551 1552
2 3 2 12 1 3 4 2 6 1 2 3 2 1
1553 1558 1561 1564 1565 1567 1569 1570 1573 1574 1577 1579 1586 1587
1 3 4 2 9 3 2 7 2 4 1 2 1 1
1588 1593 1597 1598 1603 1604 1605 1607 1609 1618 1619 1624 1626 1627
8 3 6 3 1 1 1 2 7 1 2 1 2 1
1634 1638 1650 1654 1655 1656 1678 1690 1701 1708 1717 1727 1739 1743
2 16 1 3 1 3 2 1 1 5 1 1 3 1
1747 1753 1764 1771 1776 1785 1795 1796 1808 1824 1859 1875 1917 1922
6 4 3 4 1 3 4 4 5 1 3 3 1 2
1945 1971
5 4
3 least connected regions:
2914 3384 9808 with 2 links
4 most connected regions:
1901 7609 10897 12072 with 1971 links
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 15901 252841801 15901 79.4628 64105.37
Perform Moran’s I test
The following code chunk performs Moran’s I test using lm.morantest function of spdep package. It compares the multiple linear regression and the spatial weight.
lm.morantest(hdb_final_mlr2, nb_lw)
Global Moran I for regression residuals
data:
model: lm(formula = resale_price ~ floor_area_sqm +
remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE +
PROX_HAWKER + PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS +
NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M
+ NUM_PRISCH_1KM + storey_31TO33 + storey_34TO36 +
storey_37TO39 + storey_40TO42, data = hdb_final_sf)
weights: nb_lw
Moran I statistic standard deviate = 752.22, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Observed Moran I Expectation Variance
3.847662e-01 -3.532062e-04 2.621183e-07
From the above, the p-value is lower than 2.2e-16, which is lesser than alpha value of 0.05. Therefore, we reject the null hypothesis that residual are randomly distributed. In addition, the Observed Global Moran I test is 0.38 and it is greater than 0. This means that the residual resembles a clustering distribution.
In this section, we will build the Hedonic Pricing Models using Fixed and Adaptive GWR model, with the sames variable of the revised multiple linear regression model, hdb_final_mlr2.
The following code chunk determines the optimal fixed bandwidth for the model using bw.gwr function of GWmodel package.
The following are the arguements: - Approach set as Cross-validation (“CV”) - Weighting function set as Guassian - Adaptive set as False since we are computing fixed bandwidth - longlat set as False to calculate Euclidean distance
bw.fixed <- bw.gwr(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER +
PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M +
NUM_PRISCH_1KM + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42, data=hdb_final_sp, approach="CV", kernel="gaussian", adaptive=FALSE, longlat=FALSE)
The output of the above code chunk will shown in screenshot format using the code chunk below because the run time is very long.
From the above screenshot, we can see that the recommended fixed bandwidth is 350.8655 metres.
The following code chunk calibrate the gwr model using gwr.basic function of GWmodel with fixed bandwidth and gaussian kernel.
gwr.fixed <- gwr.basic(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER +
PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M +
NUM_PRISCH_1KM + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42, data=hdb_final_sp, bw=bw.fixed, kernel="gaussian", adaptive=FALSE, longlat=FALSE)
gwr.fixed
The output of the above code chunk will shown in screenshot format using the code chunk below due to the long run time.
From the fixed bandwidth GWR model, the adjusted R-square value is 0.941, higher than the revised multiple linear regression models with adjusted R-square value of 0.681.
The following code chunk determines the optimal adaptive bandwidth for the model using bw.gwr function of GWmodel package. It is similar to 8.1.1, the only difference is that the adaptive argument is set to TRUE since we are computing adaptive bandwidth.
bw.adaptive <- bw.gwr(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER +
PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M +
NUM_PRISCH_1KM + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42, data=hdb_final_sp, approach="CV", kernel="gaussian", adaptive=TRUE, longlat=FALSE)
The output of the above code chunk will shown in screenshot format using the code chunk below due to the long run time.
From the screenshot above, we can see that the recommended adaptive bandwidth is 209 metres.
The following code chunk calibrate the gwr model using gwr.basic function of GWmodel with adaptive bandwidth and gaussian kernel.
gwr.adaptive <- gwr.basic(formula = resale_price ~ floor_area_sqm + remaining_lease_year + PROX_CBD + PROX_ELDERLYCARE + PROX_HAWKER +
PROX_MRTLRT + PROX_PARKS + PROX_SUPERMARKETS + NUM_KINDERGARDEN_350M + NUM_CHILDCARE_350M + NUM_BUSSTOP_350M +
NUM_PRISCH_1KM + storey_31TO33 + storey_34TO36 + storey_37TO39 + storey_40TO42, data=hdb_final_sp, bw=bw.adaptive, kernel="gaussian", adaptive=TRUE, longlat=FALSE)
gwr.adaptive
The output of the above code chunk will shown in screenshot format using the code chunk below because the run time is very long.
From the adaptive bandwidth GWR model, the adjusted R-square value is 0.914, higher than the revised multiple linear regression models with adjusted R-square value of 0.681, but lower than the fixed bandwidth GWR Model with adjusted R-square value of 0.941.
Comparing Multiple Linear Regression Model, Fixed Bandwidth GWR model and GWModel Adaptive Bandwidth GWR model, Fixed Bandwidth GWR model has the best adjusted R-square of 0.941. Hence, we will be using it to visualise GWR output in the next section.
The following code chunk:
hdb_final_sf_fixed <- st_as_sf(gwr.fixed$SDF) %>%
st_transform(crs=3414)
The following code chunk display the columns in hdb_final_sf_fixed sf object using glimpse function of pillar package
glimpse(hdb_final_sf_fixed)
We will write the hdb_final_sf_fixed files into csv using st_write function of sf package and convert GEOMETRY into X and Y column using the layer_options argument
st_write(hdb_final_sf_fixed, "data/aspatial/hdb_final_sf_fixed.csv", layer_options = "GEOMETRY=AS_XY")
Import hdb_final_sf_fixed file for visualization
The following code chunk imports hdb_final_sf_fixed file using read_csv function.
hdb_final_sf_fixed <- read_csv("data/aspatial/hdb_final_sf_fixed.csv")
The following code chunk:
hdb_final_sf_fixed1 <- st_as_sf(hdb_final_sf_fixed,
coords = c("X",
"Y"),
crs=3414)
The following code chunk display the columns of hdb_final_sf_fixed1 sf object using glimpse function of pillar package
glimpse(hdb_final_sf_fixed1)
Rows: 15,901
Columns: 58
$ Intercept <dbl> 67802.83, 79647.26, 93829.81, 94708~
$ floor_area_sqm <dbl> 2011.414, 2039.657, 2081.750, 2083.~
$ remaining_lease_year <dbl> 7426.111, 7509.791, 7696.751, 7698.~
$ PROX_CBD <dbl> -27583.35, -30471.34, -35646.17, -3~
$ PROX_ELDERLYCARE <dbl> -130824.1, -122416.3, -104814.9, -1~
$ PROX_HAWKER <dbl> -49960.47, -52496.02, -50256.52, -4~
$ PROX_MRTLRT <dbl> -192957.1, -184646.8, -171212.2, -1~
$ PROX_PARKS <dbl> 121851.95, 111663.05, 89809.93, 894~
$ PROX_SUPERMARKETS <dbl> 158056.3, 139764.3, 103421.0, 10280~
$ NUM_KINDERGARDEN_350M <dbl> -1032.7275, -1178.4605, 2209.5626, ~
$ NUM_CHILDCARE_350M <dbl> -6674.598, -6097.525, -4752.532, -4~
$ NUM_BUSSTOP_350M <dbl> 2852.045, 3961.953, 6780.329, 6866.~
$ NUM_PRISCH_1KM <dbl> -25348.97, -29604.97, -41058.13, -4~
$ storey_31TO33 <dbl> 112603.13, 102415.79, 77395.72, 764~
$ storey_34TO36 <dbl> 127070.89, 116328.51, 91319.19, 904~
$ storey_37TO39 <dbl> 155193.7, 144864.5, 119865.8, 11894~
$ storey_40TO42 <dbl> 242479.7, 240244.5, 232135.0, 23221~
$ y <dbl> 435000, 370000, 440000, 470000, 488~
$ yhat <dbl> 417980.2, 372523.7, 478742.2, 48243~
$ residual <dbl> 17019.848, -2523.734, -38742.194, -~
$ CV_Score <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ Stud_residual <dbl> 0.62086976, -0.09186767, -1.4043721~
$ Intercept_SE <dbl> 57561.19, 57600.58, 59440.10, 59563~
$ floor_area_sqm_SE <dbl> 370.8196, 378.9356, 402.4416, 403.7~
$ remaining_lease_year_SE <dbl> 333.9225, 310.6055, 280.0002, 279.5~
$ PROX_CBD_SE <dbl> 5001.791, 5064.560, 5279.709, 5288.~
$ PROX_ELDERLYCARE_SE <dbl> 31581.57, 32498.18, 33719.39, 33776~
$ PROX_HAWKER_SE <dbl> 35936.79, 34671.79, 32269.50, 32150~
$ PROX_MRTLRT_SE <dbl> 21996.95, 21921.63, 22484.22, 22491~
$ PROX_PARKS_SE <dbl> 28204.54, 28314.91, 28139.99, 28112~
$ PROX_SUPERMARKETS_SE <dbl> 48331.83, 49738.75, 51183.75, 51181~
$ NUM_KINDERGARDEN_350M_SE <dbl> 5985.000, 5820.768, 5660.172, 5663.~
$ NUM_CHILDCARE_350M_SE <dbl> 3413.160, 3418.766, 3440.792, 3443.~
$ NUM_BUSSTOP_350M_SE <dbl> 2363.977, 2355.700, 2287.557, 2287.~
$ NUM_PRISCH_1KM_SE <dbl> 11550.34, 11774.78, 12058.99, 12035~
$ storey_31TO33_SE <dbl> 31783.61, 32267.39, 31542.69, 31497~
$ storey_34TO36_SE <dbl> 32448.21, 32381.62, 31521.37, 31476~
$ storey_37TO39_SE <dbl> 36378.21, 37868.71, 37598.74, 37563~
$ storey_40TO42_SE <dbl> 35037.99, 34947.49, 34783.18, 34790~
$ Intercept_TV <dbl> 1.1779260, 1.3827511, 1.5785609, 1.~
$ floor_area_sqm_TV <dbl> 5.424239, 5.382594, 5.172800, 5.161~
$ remaining_lease_year_TV <dbl> 22.23903, 24.17790, 27.48837, 27.53~
$ PROX_CBD_TV <dbl> -5.514695, -6.016582, -6.751540, -6~
$ PROX_ELDERLYCARE_TV <dbl> -4.142419, -3.766867, -3.108445, -3~
$ PROX_HAWKER_TV <dbl> -1.390232, -1.514084, -1.557400, -1~
$ PROX_MRTLRT_TV <dbl> -8.771992, -8.423043, -7.614769, -7~
$ PROX_PARKS_TV <dbl> 4.320295, 3.943614, 3.191541, 3.180~
$ PROX_SUPERMARKETS_TV <dbl> 3.270232, 2.809968, 2.020582, 2.008~
$ NUM_KINDERGARDEN_350M_TV <dbl> -0.17255263, -0.20245792, 0.3903702~
$ NUM_CHILDCARE_350M_TV <dbl> -1.955548, -1.783546, -1.381232, -1~
$ NUM_BUSSTOP_350M_TV <dbl> 1.2064606, 1.6818583, 2.9640046, 3.~
$ NUM_PRISCH_1KM_TV <dbl> -2.194651, -2.514269, -3.404774, -3~
$ storey_31TO33_TV <dbl> 3.542805, 3.173972, 2.453682, 2.428~
$ storey_34TO36_TV <dbl> 3.916115, 3.592424, 2.897057, 2.872~
$ storey_37TO39_TV <dbl> 4.266116, 3.825439, 3.188027, 3.166~
$ storey_40TO42_TV <dbl> 6.920481, 6.874441, 6.673772, 6.674~
$ Local_R2 <dbl> 0.9287239, 0.9310459, 0.9384631, 0.~
$ geometry <POINT [m]> POINT (30942.4 33819.57), POI~
From the above, we can see the gwr outputs such as Local_R2, residuals, coefficient standard error, and observed and predicted y values. We will visualise them in the next section.
The following code chunk checks the statistics of the predicted value yhat using summary function
summary(hdb_final_sf_fixed1$yhat)
Min. 1st Qu. Median Mean 3rd Qu. Max.
236602 354981 406736 433767 465594 1077240
From the summary statistics, the lowest predicted price for 4-room hdb flat is 236602 and the highest predicted price is 1077240. The average is 433767.
Visualization
The following code chunk plots the interactive point maps of Local_R2 using tmap package function.
tmap_mode("view")
tm_shape(mpsz_svy21)+
tm_polygons() +
tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_sf_fixed1) +
tm_dots(col = "Local_R2",
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(10,15))
tmap_mode("plot")
From the Local_R2 map, we can see that in general, the fixed GWR model predicts well as shown by many red points on the map. Region such as North-east and East are predicted poorly by the fixed GWR model as there are a few yellow points and quite a number of orange point in these two region, and this means that the Local_R2 value is low and range from 0.2 to 0.4, and 0.6 to 0.8 respectively. We would want to analyse these region further.
Summary statistics
The following code chunk checks the statistics of Local_R2 using summary function
summary(hdb_final_sf_fixed1$Local_R2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2329 0.8321 0.9152 0.8661 0.9560 1.0000
From the summary statistics, the min is 0.2329 and the max is 1. The 1st quartile and median is already 0.8321 and 0.915 respectively which is high. The mean is also high at 0.8661. This shows that the Local_R2 are high in general and have some low values as seen from the large difference between the min and first quartile.
North-east
The following code chunk plots an interactive map within the north-east planning area.
tmap_mode("view")
tm_shape(mpsz_svy21[mpsz_svy21$REGION_N=="NORTH-EAST REGION", ])+
tm_polygons() +
tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_sf_fixed1) +
tm_dots(col = "Local_R2",
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(10,15))
tmap_mode("plot")
In the North-east region, we can see that there is particularly one area with many yellow points which means these area are poorly predicted by the fixed GWR model. If we toggle the mpsz_svy21 layer off and turn on open street map view, we can see that the area is Fernvale. There are also two areas with many orange points. From the open street map view, the areas are Damai, Oasis and Kadaloor in Punggol, and Sengkang South.
East
The following code chunk plots an interactive map within the East planning area.
tmap_mode("view")
tm_shape(mpsz_svy21[mpsz_svy21$REGION_N=="EAST REGION", ])+
tm_polygons() +
tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_sf_fixed1) +
tm_dots(col = "Local_R2",
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(10,15))
tmap_mode("plot")
In the east planning area, there are two areas with many yellow and orange points. If we toggle the mpsz_svy21 layer off and turn OpenStreetMap on, we can see that these areas are Tampines North, Tampines East, and around Pasir Ris Dr 12.
Visualization
The following code chunk plots the spatial point map of Observed and Predicted Y side by side using tmap_arrange function of tmap package.
ymap <- tm_shape(mpsz_svy21)+
tm_polygons() +
tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_sf_fixed1) +
tm_dots(col = "y",
border.col = "gray60",
border.lwd = 1) +
tm_layout(title="Observed Y")
yhatmap <- tm_shape(mpsz_svy21)+
tm_polygons() +
tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_sf_fixed1) +
tm_dots(col = "yhat",
border.col = "gray60",
border.lwd = 1) +
tm_layout(title="Predicted Y")
tmap_arrange(ymap, yhatmap)

From the comparison, we can see that it looks the same in general which means that the prediction by the fixed GWR model is good.
Summary statistics
The following code chunk checks the statistics of the predicted value yhat using summary function
summary(hdb_final_sf_fixed1$yhat)
Min. 1st Qu. Median Mean 3rd Qu. Max.
236602 354981 406736 433767 465594 1077240
From the summary statistics, the lowest predicted price for 4-room hdb flat is 236602 and the highest predicted price is 1077240. The average is 433767. This shows that there are only some 4 hdb flats with extremely high predicted price of over 1000000 as seen in the large difference between the third quartile and max resale price.
Visualization
The following code chunk plots the spatial point map of standardized Residual.
tm_shape(mpsz_svy21)+
tm_polygons() +
tmap_options(check.and.fix = TRUE) +
tm_shape(hdb_final_sf_fixed1) +
tm_dots(col = "Stud_residual",
border.col = "gray60",
border.lwd = 1) +
tm_layout(legend.text.size = 1)

From the residual map, we can see that majority of the points are yellow or light green which means that the standardized residual is between -5 to 0, or between 0 to 5. We couldn’t really determine the range because the bin is huge. Hence, we will need to see the summary statistic.
Summary statistics
The following code chunk checks the statistics of the standardized residual using summary function
summary(hdb_final_sf_fixed1$Stud_residual)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-8.53156 -0.55717 0.03430 -0.00703 0.59431 11.84665
From the summary statistics, the min is beyond -1 and the max is beyond 1 and they are outliers because of the huge difference from the 1st quartile and 3rd quartile respectively. However, the mean and median is almost near 0 which shows that majority of the prediction by the fixed GWR model are accurate with a low residual.